A Word Embedding Approach to Identifying Verb-Noun Idiomatic Combinations

Authors

  • Waseem Gharbieh
  • Virendra Bhavsar
  • Paul Cook
Abstract

Verb–noun idiomatic combinations (VNICs) are idioms consisting of a verb with a noun in its direct object position. Usages of these expressions can be ambiguous between an idiomatic usage and a literal combination. In this paper we propose supervised and unsupervised approaches, based on word embeddings, to identifying token instances of VNICs. Our proposed supervised and unsupervised approaches perform better than the supervised and unsupervised approaches of Fazly et al. (2009), respectively.

1 Verb–noun Idiomatic Combinations

Much research on multiword expressions (MWEs) in natural language processing (NLP) has focused on various type-level prediction tasks, e.g., MWE extraction (e.g., Church and Hanks, 1990; Smadja, 1993; Lin, 1999), that is, determining which MWE types are present in a given corpus (Baldwin and Kim, 2010), and compositionality prediction (e.g., McCarthy et al., 2003; Reddy et al., 2011; Salehi et al., 2014). However, word combinations can be ambiguous between literal combinations and MWEs. For example, consider the following two usages of the expression hit the roof:

1. I think Paula might hit the roof if you start ironing.
2. When the blood hit the roof of the car I realised it was serious.

The first example of hit the roof is an idiomatic usage, while the second is a literal combination.¹ MWE identification is the task of determining which token instances in running text are MWEs (Baldwin and Kim, 2010). Although there has been relatively less work on MWE identification than on type-level MWE prediction tasks, it is nevertheless important for NLP applications, such as machine translation, that must be able to distinguish MWEs from literal combinations in context. Some recent work has focused on token-level identification of a wide range of types of MWEs and other multiword units (e.g., Newman et al., 2012; Schneider et al., 2014; Brooke et al., 2014).

¹ These examples, and idiomaticity judgements, are taken from Cook et al. (2008).
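To make token-level identification with word embeddings concrete, here is a minimal sketch of one generic unsupervised heuristic. It is not the method proposed in this paper, whose details are not included here. It assumes that the context of a literal usage is distributionally closer to the expression's component words than the context of an idiomatic usage is. Both the context and the components are represented by averaged word vectors and compared with cosine similarity. The embedding table `EMBEDDINGS`, its values, the helper names, and the 0.8 threshold are all hypothetical and invented for illustration. A real system would load pretrained vectors and tune any threshold on held-out data.

```python
import math
from typing import Dict, List, Sequence

# Toy 3-dimensional vectors invented for illustration only (hypothetical);
# a real system would use pretrained word embeddings instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "hit": [0.9, 0.1, 0.0],
    "roof": [0.1, 0.9, 0.1],
    "blood": [0.6, 0.5, 0.1],
    "car": [0.2, 0.8, 0.2],
    "paula": [0.0, 0.1, 0.9],
    "ironing": [0.1, 0.0, 0.8],
}


def average(vectors: Sequence[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def literalness(context: Sequence[str], components: Sequence[str]) -> float:
    """Cosine similarity between the averaged context vector and the averaged
    component-word vector; out-of-vocabulary words are skipped."""
    ctx = [EMBEDDINGS[w] for w in context if w in EMBEDDINGS]
    comp = [EMBEDDINGS[w] for w in components if w in EMBEDDINGS]
    if not ctx or not comp:
        return 0.0
    return cosine(average(ctx), average(comp))


def classify(context: Sequence[str], components: Sequence[str], threshold: float = 0.8) -> str:
    """Hypothetical decision rule: high similarity to the components => literal."""
    return "literal" if literalness(context, components) >= threshold else "idiomatic"


if __name__ == "__main__":
    components = ["hit", "roof"]
    print(classify(["blood", "car"], components))      # literal
    print(classify(["paula", "ironing"], components))  # idiomatic
```

With these toy vectors, the context words of the literal example (blood, car) score about 0.97 against hit and roof. The context words of the idiomatic example (paula, ironing) score about 0.15. The two usages therefore fall on opposite sides of the threshold.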
Many studies, however, have taken a word sense disambiguation–inspired approach to MWE identification (e.g., Birke and Sarkar, 2006; Katz and Giesbrecht, 2006; Li et al., 2010), treating literal combinations and MWEs as different word senses, and have exploited linguistic knowledge of MWEs (e.g., Patrick and Fletcher, 2005; Uchiyama et al., 2005; Hashimoto and Kawahara, 2008; Fazly et al., 2009; Fothergill and Baldwin, 2012). In this study we focus on English verb–noun idiomatic combinations (VNICs). VNICs are formed from a verb with a noun in its direct object position. They are a common and productive type of English idiom, and occur cross-lingually (Fazly et al., 2009). VNICs tend to be relatively lexico-syntactically fixed, e.g., whereas hit the roof is ambiguous between literal and idiomatic meanings, hit the roofs and a roof was hit are most likely to be literal usages. Fazly et al. (2009) exploit this property in their unsupervised approach, referred to as CFORM. They define lexico-syntactic patterns for VNIC token instances based on the noun’s determiner (e.g., a, the, or possibly no determiner), the number of the noun (singular or plural), and the verb’s voice (active or passive). They propose a statistical method for automatically determining a given VNIC type’s canonical idiomatic form, based on the frequency of its usage in these
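The form-based idea behind CFORM can be sketched as follows. This is a simplified illustration, not Fazly et al.'s implementation, and it rests on several assumptions. Patterns are (voice, determiner, number) triples, with determiners collapsed to "a", "the", or none. A pattern counts as canonical when its frequency z-score across all patterns exceeds a threshold, assumed here to be 1.0. A token is labelled idiomatic if it occurs in a canonical form and literal otherwise. The helper names and the pattern counts are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Iterable, Set, Tuple

# A lexico-syntactic pattern: (voice, determiner, number).
# Determiners are simplified to "a", "the", or "none" (an assumption).
Pattern = Tuple[str, str, str]

VOICES = ("active", "passive")
DETERMINERS = ("a", "the", "none")
NUMBERS = ("sg", "pl")
ALL_PATTERNS = [(v, d, n) for v in VOICES for d in DETERMINERS for n in NUMBERS]


def canonical_forms(observed: Iterable[Pattern], z_threshold: float = 1.0) -> Set[Pattern]:
    """Patterns whose frequency z-score (over all patterns) exceeds z_threshold."""
    counts = Counter(observed)
    freqs = [counts[p] for p in ALL_PATTERNS]  # unseen patterns contribute 0
    mu, sigma = mean(freqs), pstdev(freqs)
    if sigma == 0:  # all patterns equally frequent: no canonical form
        return set()
    return {p for p in ALL_PATTERNS if (counts[p] - mu) / sigma > z_threshold}


def cform_label(token_pattern: Pattern, canon: Set[Pattern]) -> str:
    """Label a token idiomatic if it occurs in a canonical form, else literal."""
    return "idiomatic" if token_pattern in canon else "literal"


if __name__ == "__main__":
    # Invented pattern counts for usages of the verb-noun pair hit + roof.
    observed = (
        [("active", "the", "sg")] * 8
        + [("active", "the", "pl")] * 1
        + [("passive", "a", "sg")] * 1
    )
    canon = canonical_forms(observed)
    print(canon)                                       # {('active', 'the', 'sg')}
    print(cform_label(("passive", "a", "sg"), canon))  # literal
```

A purely form-based rule like this one labels both hit the roof usages in the examples above identically, because both are in the active voice, use the, and have a singular noun.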

Similar sources

Unsupervised Classification of Verb Noun Multi-Word Expression Tokens

We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is mea...

The VNC-Tokens Dataset

Idiomatic expressions formed from a verb and a noun in its direct object position are a productive cross-lingual class of multiword expressions, which can be used both idiomatically and as a literal combination. This paper presents the VNC-Tokens dataset, a resource of almost 3000 English verb–noun combination usages annotated as to whether they are literal or idiomatic. Previous research using...

Lexical and Grammatical Collocations in Writing Production of EFL Learners

Lewis (1993) recognized the significance of word combinations, including collocations, by presenting the lexical approach. Because of the crucial role of collocations in vocabulary acquisition, this research set out to evaluate the rate of collocations in Iranian EFL learners' writing production across L1 and L2. In addition, L1 interference with L2 collocational use in the learners' writing samples was st...

Automatic Acquisition of Knowledge About Multiword Predicates

Human interpretation of natural language relies heavily on cognitive processes involving metaphorical and idiomatic meanings. One area of computational linguistics in which such processes play an important, but largely unaddressed, role is the determination of the properties of multiword predicates (MWPs). MWPs such as give a groan and cut taxes involve metaphorical meaning extensions of highly...

An Analysis of Annotation of Verb-Noun Idiomatic Combinations in a Parallel Dependency Corpus

Valency in the PDTs. For Czech: PDT-Vallex, http://ufal.mff.cuni.cz/lindat/PDT-Vallex.html. For English: EngVallex, http://ufal.mff.cuni.cz/lindat/EngVallex.html. PDTs (ID -> val. frame); LEXICONS. to give: ACT(.1) PAT(.4) ADDR(.3). Valency and MWEs. Valency: ability of words to combine themselves with other lexical units. FGD valency theory. MWEs: “idiosyncratic interpretations that cross word boundaries...

Publication date: 2016